

Review for NeurIPS paper: A Boolean Task Algebra for Reinforcement Learning

Neural Information Processing Systems

Additional Feedback: Some ideas on how to relax the restrictive assumptions: The relationship to UVFAs is intriguing, and may potentially lead to a means of applying an approximate version of the results of this paper to more complex settings. For example, what happens if one applies the Boolean operators on value functions to UVFAs? While it's probably possible to construct MDPs in which this won't work, it seems plausible that for sufficiently sparse reward settings one might obtain good value function approximations. I also wonder if it might be possible to apply these results to the setting of van Niekerk et al., which appears somewhat looser in its assumptions on the MDP transition dynamics and the reward function. A couple of points remain that I feel weren't fully addressed by the rebuttal.


Review for NeurIPS paper: A Boolean Task Algebra for Reinforcement Learning

Neural Information Processing Systems

All reviewers support acceptance based on the contributions, namely the development of a Boolean task algebra for reinforcement learning, a clear theoretical and empirical analysis, and efficient zero-shot transfer by task composition when the problem structure is amenable. Please consider revising your paper to address the concerns raised in the reviews and rebuttal, in particular the comments on the restrictive assumptions.


A Boolean Task Algebra for Reinforcement Learning

Neural Information Processing Systems

The ability to compose learned skills to solve new tasks is an important property for lifelong-learning agents. In this work, we formalise the logical composition of tasks as a Boolean algebra. This allows us to formulate new tasks in terms of the negation, disjunction and conjunction of a set of base tasks. We then show that by learning goal-oriented value functions and restricting the transition dynamics of the tasks, an agent can solve these new tasks with no further learning. We prove that by composing these value functions in specific ways, we immediately recover the optimal policies for all tasks expressible under the Boolean algebra. We verify our approach in two domains---including a high-dimensional video game environment requiring function approximation---where an agent first learns a set of base skills, and then composes them to solve a super-exponential number of new tasks.
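To make the composition concrete, the paper's Boolean operators over goal-oriented value functions reduce to simple elementwise operations: disjunction is a pointwise maximum, conjunction a pointwise minimum, and negation a reflection about the value functions of the maximum and minimum tasks. The sketch below illustrates this in the tabular case; the dict-of-(state, action) representation and the names `q_max`/`q_min` are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of Boolean composition over tabular action-value functions.
# Each Q is assumed to be a dict mapping (state, action) -> value;
# q_max and q_min denote the value functions of the maximum and minimum
# tasks over the task space, as in the paper's negation construction.

def disjunction(q1, q2):
    """Value function for "task1 OR task2": elementwise maximum."""
    return {sa: max(q1[sa], q2[sa]) for sa in q1}

def conjunction(q1, q2):
    """Value function for "task1 AND task2": elementwise minimum."""
    return {sa: min(q1[sa], q2[sa]) for sa in q1}

def negation(q, q_max, q_min):
    """Value function for "NOT task": reflect q about q_max and q_min."""
    return {sa: (q_max[sa] + q_min[sa]) - q[sa] for sa in q}


# Toy usage: one state 's' with actions reaching goal 'a' or goal 'b'.
q_a = {('s', 'a'): 1.0, ('s', 'b'): 0.0}   # task: reach goal a
q_b = {('s', 'a'): 0.0, ('s', 'b'): 1.0}   # task: reach goal b
q_top = {('s', 'a'): 1.0, ('s', 'b'): 1.0} # maximum task (any goal rewarded)
q_bot = {('s', 'a'): 0.0, ('s', 'b'): 0.0} # minimum task (no goal rewarded)

q_or = disjunction(q_a, q_b)               # reach a OR b
q_not_a = negation(q_a, q_top, q_bot)      # reach anything but a
```

Because the operators act pointwise, any formula over the base tasks (e.g. `(a OR b) AND NOT c`) composes in zero additional learning steps, which is what enables solving a super-exponential number of derived tasks from a small learned base.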